In [1]:
using Pkg
Pkg.activate(pwd())
  Activating environment at `~/Documents/github.com/ucla-biostat216-2021fall.github.io/slides/03-matrix/Project.toml`
In [2]:
using DSP, ForwardDiff, GraphPlot, ImageCore, ImageFiltering, ImageIO, ImageShow, LightGraphs
using LinearAlgebra, Plots, Polynomials, Random, SparseArrays, SpecialMatrices
using Symbolics, TestImages, UnicodePlots
Random.seed!(216)
Out[2]:
MersenneTwister(216)

Matrices (BV Chapters 6-10)

Notations and terminology

  • A matrix $\mathbf{A} \in \mathbb{R}^{m \times n}$ $$ \mathbf{A} = \begin{pmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \cdots & a_{mn} \end{pmatrix}. $$

    • $a_{ij}$ are the matrix elements or entries or coefficients or components.
    • Size or dimension of the matrix is $m \times n$.
  • Many authors use square brackets: $$ \mathbf{A} = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix}. $$

In [3]:
# a 3x4 matrix
A = [0 1 -2.3 0.1;
    1.3 4 -0.1 0;
    4.1 -1 0 1.7]
Out[3]:
3×4 Matrix{Float64}:
 0.0   1.0  -2.3  0.1
 1.3   4.0  -0.1  0.0
 4.1  -1.0   0.0  1.7
  • We say a matrix $\mathbf{A} \in \mathbb{R}^{m \times n}$ is
    • tall if $m > n$;
    • wide if $m < n$;
    • square if $m = n$.
In [4]:
# a tall matrix
A = randn(3, 2)
Out[4]:
3×2 Matrix{Float64}:
 -1.28506   -0.326449
 -1.44549    1.8623
 -0.244914   0.0582264
In [5]:
# a wide matrix
A = randn(2, 3)
Out[5]:
2×3 Matrix{Float64}:
 -0.135717  -2.23444   0.854177
 -0.8972     0.915744  0.576381
In [6]:
# a square matrix
A = randn(3, 3)
Out[6]:
3×3 Matrix{Float64}:
 -0.0627216   0.848249  0.336559
  0.668146   -0.358689  0.321647
 -0.0148676  -1.07614   0.183508
  • Block matrix is a rectangular array of matrices $$ \mathbf{A} = \begin{pmatrix} \mathbf{B} & \mathbf{C} \\ \mathbf{D} & \mathbf{E} \end{pmatrix}. $$ The elements of a block matrix are its blocks or submatrices. Dimensions of the blocks must be compatible.
In [7]:
# blocks
B = [2; 1]
C = [0 2 3; 5 4 7]
D = [1]
E = [-1 6 0]
# block matrix
A = [B C; D E]
Out[7]:
3×4 Matrix{Int64}:
 2   0  2  3
 1   5  4  7
 1  -1  6  0
  • Matrix $\mathbf{A} \in \mathbb{R}^{m \times n}$ can be viewed as a $1 \times n$ block matrix with each column as a block $$ \mathbf{A} = \begin{pmatrix} \mathbf{a}_1 & \cdots & \mathbf{a}_n \end{pmatrix}, $$ where $$ \mathbf{a}_j = \begin{pmatrix} a_{1j} \\ \vdots \\ a_{mj} \end{pmatrix} $$ is the $j$-th column of $\mathbf{A}$.

  • $\mathbf{A} \in \mathbb{R}^{m \times n}$ viewed as an $m \times 1$ block matrix with each row as a block $$ \mathbf{A} = \begin{pmatrix} \mathbf{b}_1' \\ \vdots \\ \mathbf{b}_m' \end{pmatrix}, $$ where $$ \mathbf{b}_i' = \begin{pmatrix} a_{i1} \, \ldots \, a_{in} \end{pmatrix} $$ is the $i$-th row of $\mathbf{A}$.

  • Submatrix: $\mathbf{A}_{i_1:i_2,j_1:j_2}$.
In [8]:
A = randn(5, 3)
Out[8]:
5×3 Matrix{Float64}:
 -1.46013    -0.159054  -0.555663
  0.0341577   1.06857    0.794304
 -1.38244    -1.38077    1.26949
  0.527643    1.90148    0.333068
  0.277008    0.853046   1.66378
In [9]:
# sub-array
A[2:3, 2:3]
Out[9]:
2×2 Matrix{Float64}:
  1.06857  0.794304
 -1.38077  1.26949
In [10]:
# a row
A[4, :]
Out[10]:
3-element Vector{Float64}:
 0.5276425490425348
 1.901483549649674
 0.3330681151463122
In [11]:
# a column
A[:, 2]
Out[11]:
5-element Vector{Float64}:
 -0.15905405454144597
  1.0685663823117286
 -1.3807742942742485
  1.901483549649674
  0.8530463796675399

Some special matrices

  • Zero matrix $\mathbf{0}_{m \times n} \in \mathbb{R}^{m \times n}$ is the matrix with all entries $a_{ij}=0$. The subscript is often omitted when the dimension is clear from context.

  • Ones matrix $\mathbf{1}_{m \times n} \in \mathbb{R}^{m \times n}$ is the matrix with all entries $a_{ij}=1$. It can be written as the outer product $\mathbf{1}_m \mathbf{1}_n'$.

  • Identity matrix $\mathbf{I}_n \in \mathbb{R}^{n \times n}$ has entries $a_{ij} = \delta_{ij}$ (Kronecker delta notation): $$ \mathbf{I}_n = \begin{pmatrix} 1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 1 \end{pmatrix}. $$ The subscript is often omitted when the dimension is clear from context. The columns of $\mathbf{I}_n$ are the unit vectors $$ \mathbf{I}_n = \begin{pmatrix} \mathbf{e}_1 \, \mathbf{e}_2 \, \ldots \, \mathbf{e}_n \end{pmatrix}. $$ The identity matrix is called uniform scaling in some computer languages.
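The ones-matrix factorization $\mathbf{1}_{m \times n} = \mathbf{1}_m \mathbf{1}_n'$ can be checked directly (a quick sketch, not part of the original notebook):

```julia
# the m×n ones matrix equals the outer product of two ones vectors
m, n = 3, 4
@show ones(m, n) == ones(m) * ones(n)'   # true
```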

In [12]:
# a 3x5 zero matrix
zeros(3, 5)
Out[12]:
3×5 Matrix{Float64}:
 0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0
In [13]:
# uniform scaling
I
Out[13]:
UniformScaling{Bool}
true*I
In [14]:
# a 3x3 identity matrix
I(3)
Out[14]:
3×3 Diagonal{Bool, Vector{Bool}}:
 1  ⋅  ⋅
 ⋅  1  ⋅
 ⋅  ⋅  1
In [15]:
# convert to dense matrix
Matrix(I(3))
Out[15]:
3×3 Matrix{Bool}:
 1  0  0
 0  1  0
 0  0  1
  • Symmetric matrix is a square matrix $\mathbf{A}$ with $a_{ij} = a_{ji}$.
In [16]:
# a symmetric matrix
A = [4 3 -2; 3 -1 5; -2 5 0]
Out[16]:
3×3 Matrix{Int64}:
  4   3  -2
  3  -1   5
 -2   5   0
In [17]:
issymmetric(A)
Out[17]:
true
In [18]:
# turn a general square matrix into a symmetric matrix
A = randn(3, 3)
Out[18]:
3×3 Matrix{Float64}:
 -0.0676266   0.0104737  -0.904305
 -0.474861   -0.254432    0.369159
  0.0561321  -0.529029    0.519145
In [19]:
# use upper triangular part as data
Symmetric(A)
Out[19]:
3×3 Symmetric{Float64, Matrix{Float64}}:
 -0.0676266   0.0104737  -0.904305
  0.0104737  -0.254432    0.369159
 -0.904305    0.369159    0.519145
In [20]:
# use lower triangular part as data
Symmetric(A, :L)
Out[20]:
3×3 Symmetric{Float64, Matrix{Float64}}:
 -0.0676266  -0.474861   0.0561321
 -0.474861   -0.254432  -0.529029
  0.0561321  -0.529029   0.519145
  • A diagonal matrix is a square matrix $\mathbf{A}$ with $a_{ij} = 0$ for all $i \ne j$.
In [21]:
# a diagonal matrix
A = [-1 0 0; 0 2 0; 0 0 -5]
Out[21]:
3×3 Matrix{Int64}:
 -1  0   0
  0  2   0
  0  0  -5
In [22]:
# turn a general square matrix into a diagonal matrix
Diagonal(A)
Out[22]:
3×3 Diagonal{Int64, Vector{Int64}}:
 -1  ⋅   ⋅
  ⋅  2   ⋅
  ⋅  ⋅  -5
  • A lower triangular matrix is a square matrix $\mathbf{A}$ with $a_{ij} = 0$ for all $i < j$.

  • An upper triangular matrix is a square matrix $\mathbf{A}$ with $a_{ij} = 0$ for all $i > j$.

In [23]:
# a lower triangular matrix
A = [4 0 0; 3 -1 0; -1 5 -2]
Out[23]:
3×3 Matrix{Int64}:
  4   0   0
  3  -1   0
 -1   5  -2
In [24]:
# turn a general square matrix into a lower triangular matrix
A = randn(3, 3)
LowerTriangular(A)
Out[24]:
3×3 LowerTriangular{Float64, Matrix{Float64}}:
  0.0366214    ⋅           ⋅ 
 -0.516973    0.0865552    ⋅ 
  0.0669796  -0.430237   -0.0540081
In [25]:
# turn a general square matrix into an upper triangular matrix
UpperTriangular(A)
Out[25]:
3×3 UpperTriangular{Float64, Matrix{Float64}}:
 0.0366214  0.0965049   1.74975
  ⋅         0.0865552  -0.129446
  ⋅          ⋅         -0.0540081
  • A unit lower triangular matrix is a square matrix $\mathbf{A}$ with $a_{ij} = 0$ for all $i < j$ and $a_{ii}=1$.

  • A unit upper triangular matrix is a square matrix $\mathbf{A}$ with $a_{ij} = 0$ for all $i > j$ and $a_{ii}=1$.

In [26]:
# turn a general square matrix into a unit lower triangular matrix
UnitLowerTriangular(A)
Out[26]:
3×3 UnitLowerTriangular{Float64, Matrix{Float64}}:
  1.0          ⋅         ⋅ 
 -0.516973    1.0        ⋅ 
  0.0669796  -0.430237  1.0
In [27]:
# turn a general square matrix into a unit upper triangular matrix
UnitUpperTriangular(A)
Out[27]:
3×3 UnitUpperTriangular{Float64, Matrix{Float64}}:
 1.0  0.0965049   1.74975
  ⋅   1.0        -0.129446
  ⋅    ⋅          1.0
  • We say a matrix is sparse if most (almost all) of its elements are zero.

    • On a computer, a sparse matrix stores only the non-zero entries and their positions.

    • Special algorithms exploit sparsity; their efficiency depends on the number of nonzeros and their positions.

    • A sparse matrix can be visualized by a spy plot.
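As a rough sketch of the storage savings (exact numbers depend on density and element type):

```julia
using SparseArrays, Random

Random.seed!(216)
S = sprandn(1000, 1000, 0.001)     # ≈1000 nonzeros out of 10^6 entries
@show nnz(S)                       # number of stored entries
# sparse storage is far smaller than the dense equivalent
@show Base.summarysize(S) < Base.summarysize(Matrix(S))   # true
```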

In [28]:
# generate a random 50x120 sparse matrix, with sparsity level 0.05
A = sprandn(50, 120, 0.05)
Out[28]:
50×120 SparseMatrixCSC{Float64, Int64} with 288 stored entries:
⣀⠀⠀⠒⡀⠠⢀⠀⠠⠈⠀⠀⡀⠀⠐⠘⠀⢀⠀⢀⠀⡁⡂⠀⡈⠀⠂⠀⠀⡠⠀⠂⠀⠀⠀⠀⠀⡀⠚⠀
⠢⠐⠀⠀⢄⠀⠈⠀⠐⠀⠑⠐⠀⠀⢀⠐⣀⠈⢀⠀⡈⠀⠁⠂⠀⡐⠁⠀⠀⢄⠤⠁⠀⡀⠃⡀⠀⠀⠈⣈
⠈⠀⢀⢠⠠⠉⠀⠀⠁⠀⠴⠠⠀⠀⠀⠉⠀⠈⠰⠀⠢⠀⠀⠀⠀⠀⠀⠉⠀⠀⠠⠁⠂⠀⠀⠅⢅⠄⠐⠈
⠠⠀⠠⠨⠀⠁⢀⢀⠈⠄⠀⢀⠀⠀⠤⢀⠈⡀⠀⠀⠀⠀⠁⠂⢀⡂⠁⠀⠀⠀⠰⠭⡀⡀⢀⠂⠁⠈⠀⠀
⠀⠀⡀⠀⠠⠀⠁⠨⠈⠀⠀⢐⣠⠐⠀⠀⠄⠀⠙⠀⠘⠀⠀⢀⠆⠀⠀⠀⠅⠃⠅⢁⠂⠃⠀⠃⠀⠡⠠⠄
⠀⠀⠈⢠⡀⠀⡨⠊⠄⠀⠀⠈⠈⠂⠀⢀⠀⠠⠀⠈⠀⠤⠀⠂⠠⢀⠀⠁⠈⠁⠄⠀⡀⠄⠀⠡⠀⠅⡄⠀
⠀⠄⠄⠈⡀⠁⠐⠡⠈⠀⠰⠢⠒⢁⠂⠂⠀⠀⠈⢁⡀⠃⢀⠀⡀⠉⢂⠁⡄⢀⠂⠂⡀⡀⠀⠁⠂⠀⠈⠀
⠀⠀⠀⠀⠀⠂⠠⡀⠂⠁⠐⠂⢀⠠⠀⠂⠠⠀⡂⠀⡀⠂⠀⠂⡀⠀⡁⠀⠄⠂⠂⠀⠀⢐⠀⠀⠀⠄⠂⠀
⠀⠀⠀⠀⠀⠀⠀⠀⠈⠀⠀⠀⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⠈⠀⠀⠀⠀⠀⠀⠀⠀⠀⠁⠀⠁⠀⠀⠀⠀⠀
In [29]:
# show contents of SparseMatrixCSC{Bool, Int64}
dump(A)
SparseMatrixCSC{Float64, Int64}
  m: Int64 50
  n: Int64 120
  colptr: Array{Int64}((121,)) [1, 3, 5, 7, 8, 9, 9, 11, 12, 14  …  264, 268, 271, 273, 275, 278, 284, 285, 286, 289]
  rowval: Array{Int64}((288,)) [6, 8, 10, 22, 5, 14, 41, 8, 30, 41  …  2, 7, 15, 28, 38, 11, 29, 7, 11, 13]
  nzval: Array{Float64}((288,)) [2.335044134794315, -1.393559434162859, -2.1759100026640312, 1.8982960128801192, 0.7880030395039885, -0.9337764537577509, 1.1910513644580212, -0.43639474422789987, -0.8254307951217316, 0.9613906383118074  …  -0.01753544938721602, 1.6253141153582433, -0.6449321806678313, 0.18222465974617355, -0.7109845579830814, 1.1433097964370182, -0.3394250591410285, -0.698448958216769, -0.21833570912646233, -0.4329714009878216]
In [30]:
# qualify the call, since both UnicodePlots and Plots export `spy`
Plots.spy(A)

Matrix operations

  • Scalar-matrix multiplication or scalar-matrix product: For $\beta \in \mathbb{R}$ and $\mathbf{A} \in \mathbb{R}^{m \times n}$, $$ \beta \mathbf{A} = \begin{pmatrix} \beta a_{11} & \cdots & \beta a_{1n} \\ \vdots & \ddots & \vdots \\ \beta a_{m1} & \cdots & \beta a_{mn} \end{pmatrix}. $$
In [31]:
β = 0.5
A = [0 1 -2.3 0.1;
    1.3 4 -0.1 0;
    4.1 -1 0 1.7]
Out[31]:
3×4 Matrix{Float64}:
 0.0   1.0  -2.3  0.1
 1.3   4.0  -0.1  0.0
 4.1  -1.0   0.0  1.7
In [32]:
β * A
Out[32]:
3×4 Matrix{Float64}:
 0.0    0.5  -1.15  0.05
 0.65   2.0  -0.05  0.0
 2.05  -0.5   0.0   0.85
  • Matrix addition and subtraction: For $\mathbf{A}, \mathbf{B} \in \mathbb{R}^{m \times n}$, $$ \mathbf{A} + \mathbf{B} = \begin{pmatrix} a_{11} + b_{11} & \cdots & a_{1n} + b_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} + b_{m1} & \cdots & a_{mn} + b_{mn} \end{pmatrix}, \quad \mathbf{A} - \mathbf{B} = \begin{pmatrix} a_{11} - b_{11} & \cdots & a_{1n} - b_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} - b_{m1} & \cdots & a_{mn} - b_{mn} \end{pmatrix}. $$
In [33]:
A = [0 1 -2.3 0.1;
    1.3 4 -0.1 0;
    4.1 -1 0 1.7]
B = ones(size(A))
Out[33]:
3×4 Matrix{Float64}:
 1.0  1.0  1.0  1.0
 1.0  1.0  1.0  1.0
 1.0  1.0  1.0  1.0
In [34]:
A + B
Out[34]:
3×4 Matrix{Float64}:
 1.0  2.0  -1.3  1.1
 2.3  5.0   0.9  1.0
 5.1  0.0   1.0  2.7
In [35]:
A - B
Out[35]:
3×4 Matrix{Float64}:
 -1.0   0.0  -3.3  -0.9
  0.3   3.0  -1.1  -1.0
  3.1  -2.0  -1.0   0.7
  • Composition of the scalar-matrix product with matrix addition/subtraction gives a linear combination of matrices of the same size: $$ \alpha_1 \mathbf{A}_1 + \cdots + \alpha_k \mathbf{A}_k. $$ One example of a linear combination of matrices is color-image-to-grayscale-image conversion.
In [36]:
rgb_image = testimage("lighthouse")
Out[36]:
In [37]:
# R(ed) channel
channelview(rgb_image)[1, :, :]
Out[37]:
512×768 Array{N0f8,2} with eltype N0f8:
 0.361  0.349  0.345  0.376  0.369  …  0.349  0.341  0.329  0.345  0.349
 0.361  0.361  0.361  0.361  0.361     0.349  0.361  0.357  0.349  0.369
 0.376  0.353  0.361  0.361  0.369     0.361  0.357  0.357  0.349  0.361
 0.365  0.369  0.365  0.361  0.353     0.349  0.345  0.349  0.353  0.345
 0.365  0.376  0.361  0.365  0.365     0.357  0.361  0.361  0.361  0.349
 0.369  0.376  0.365  0.365  0.369  …  0.369  0.369  0.357  0.365  0.365
 0.38   0.365  0.369  0.365  0.365     0.369  0.361  0.361  0.361  0.365
 0.369  0.365  0.353  0.369  0.365     0.357  0.361  0.361  0.349  0.376
 0.357  0.376  0.388  0.369  0.369     0.357  0.361  0.361  0.369  0.373
 0.349  0.376  0.38   0.388  0.369     0.369  0.376  0.373  0.369  0.373
 0.373  0.376  0.376  0.38   0.369  …  0.369  0.376  0.376  0.361  0.353
 0.388  0.369  0.369  0.38   0.38      0.369  0.361  0.361  0.349  0.361
 0.38   0.388  0.376  0.369  0.376     0.361  0.373  0.349  0.353  0.376
 ⋮                                  ⋱                ⋮             
 0.204  0.18   0.184  0.263  0.396  …  0.22   0.161  0.165  0.157  0.157
 0.169  0.157  0.149  0.208  0.278     0.235  0.2    0.176  0.173  0.169
 0.149  0.153  0.176  0.192  0.149     0.243  0.235  0.251  0.251  0.231
 0.153  0.165  0.169  0.239  0.329     0.255  0.227  0.192  0.188  0.176
 0.188  0.255  0.365  0.529  0.557     0.224  0.235  0.204  0.2    0.231
 0.31   0.49   0.584  0.494  0.4    …  0.247  0.22   0.243  0.255  0.255
 0.486  0.537  0.592  0.604  0.62      0.263  0.235  0.251  0.247  0.231
 0.663  0.745  0.843  0.753  0.651     0.204  0.208  0.208  0.239  0.267
 0.702  0.678  0.616  0.541  0.506     0.267  0.271  0.259  0.259  0.267
 0.459  0.427  0.514  0.655  0.635     0.235  0.22   0.192  0.173  0.184
 0.443  0.443  0.486  0.529  0.573  …  0.173  0.165  0.173  0.165  0.18
 0.0    0.0    0.0    0.0    0.0       0.0    0.0    0.0    0.0    0.0
In [38]:
# G(reen) channel
channelview(rgb_image)[2, :, :]
Out[38]:
512×768 Array{N0f8,2} with eltype N0f8:
 0.486  0.475  0.478  0.514  0.506  …  0.486  0.475  0.463  0.482  0.49
 0.486  0.498  0.494  0.498  0.498     0.482  0.49   0.486  0.482  0.498
 0.514  0.49   0.494  0.494  0.506     0.49   0.482  0.482  0.486  0.498
 0.498  0.506  0.498  0.494  0.49      0.478  0.471  0.475  0.49   0.478
 0.498  0.51   0.494  0.498  0.498     0.482  0.49   0.486  0.498  0.486
 0.502  0.51   0.498  0.498  0.506  …  0.494  0.494  0.482  0.502  0.502
 0.518  0.498  0.506  0.498  0.498     0.494  0.49   0.486  0.498  0.502
 0.502  0.498  0.49   0.506  0.498     0.482  0.486  0.486  0.475  0.502
 0.502  0.51   0.522  0.506  0.506     0.482  0.486  0.486  0.494  0.498
 0.49   0.51   0.518  0.522  0.506     0.49   0.502  0.498  0.494  0.498
 0.518  0.51   0.514  0.518  0.514  …  0.482  0.506  0.506  0.498  0.494
 0.533  0.506  0.506  0.518  0.518     0.494  0.49   0.49   0.486  0.498
 0.525  0.522  0.51   0.502  0.51      0.49   0.498  0.475  0.49   0.514
 ⋮                                  ⋱                ⋮             
 0.169  0.145  0.157  0.224  0.353  …  0.255  0.2    0.192  0.188  0.188
 0.133  0.129  0.118  0.169  0.235     0.282  0.235  0.216  0.2    0.196
 0.122  0.122  0.161  0.153  0.106     0.29   0.282  0.286  0.29   0.271
 0.125  0.137  0.149  0.212  0.294     0.302  0.275  0.227  0.227  0.216
 0.165  0.227  0.349  0.514  0.545     0.263  0.282  0.239  0.235  0.271
 0.298  0.486  0.584  0.494  0.4    …  0.282  0.267  0.29   0.302  0.302
 0.498  0.545  0.612  0.616  0.631     0.298  0.282  0.298  0.306  0.29
 0.682  0.776  0.871  0.773  0.671     0.239  0.255  0.255  0.286  0.31
 0.745  0.722  0.655  0.573  0.533     0.306  0.322  0.31   0.306  0.31
 0.486  0.467  0.553  0.69   0.675     0.275  0.259  0.231  0.212  0.22
 0.467  0.467  0.525  0.565  0.62   …  0.204  0.192  0.204  0.192  0.208
 0.0    0.0    0.0    0.0    0.0       0.0    0.0    0.0    0.0    0.0
In [39]:
# B(lue) channel
channelview(rgb_image)[3, :, :]
Out[39]:
512×768 Array{N0f8,2} with eltype N0f8:
 0.6    0.588  0.588  0.624  0.616  …  0.537  0.525  0.518  0.525  0.529
 0.6    0.608  0.616  0.608  0.608     0.557  0.576  0.569  0.565  0.58
 0.624  0.608  0.616  0.616  0.616     0.592  0.588  0.596  0.596  0.608
 0.62   0.624  0.62   0.616  0.608     0.58   0.576  0.588  0.6    0.588
 0.627  0.639  0.616  0.62   0.62      0.596  0.592  0.6    0.608  0.596
 0.635  0.631  0.62   0.62   0.624  …  0.596  0.596  0.596  0.612  0.612
 0.635  0.62   0.624  0.62   0.62      0.596  0.592  0.6    0.608  0.612
 0.635  0.62   0.608  0.624  0.62      0.596  0.6    0.6    0.588  0.616
 0.627  0.631  0.631  0.624  0.624     0.596  0.6    0.6    0.596  0.604
 0.62   0.631  0.627  0.643  0.624     0.608  0.616  0.604  0.596  0.604
 0.643  0.631  0.624  0.635  0.631  …  0.592  0.608  0.608  0.596  0.592
 0.659  0.624  0.624  0.635  0.635     0.596  0.592  0.592  0.596  0.608
 0.643  0.651  0.639  0.635  0.631     0.592  0.604  0.588  0.6    0.624
 ⋮                                  ⋱                ⋮             
 0.137  0.125  0.145  0.196  0.31   …  0.29   0.224  0.22   0.204  0.204
 0.118  0.118  0.118  0.153  0.2       0.314  0.271  0.239  0.231  0.224
 0.11   0.122  0.153  0.145  0.082     0.333  0.314  0.325  0.314  0.294
 0.114  0.125  0.133  0.192  0.267     0.333  0.31   0.267  0.251  0.239
 0.141  0.208  0.314  0.478  0.498     0.298  0.314  0.275  0.263  0.294
 0.263  0.447  0.545  0.443  0.349  …  0.318  0.298  0.325  0.333  0.333
 0.443  0.494  0.557  0.561  0.569     0.333  0.314  0.329  0.333  0.318
 0.627  0.725  0.82   0.725  0.624     0.275  0.286  0.286  0.325  0.353
 0.678  0.667  0.612  0.529  0.494     0.341  0.341  0.341  0.349  0.353
 0.447  0.424  0.518  0.659  0.643     0.298  0.282  0.255  0.235  0.247
 0.439  0.439  0.49   0.545  0.592  …  0.22   0.22   0.22   0.2    0.22
 0.0    0.0    0.0    0.0    0.0       0.0    0.0    0.0    0.0    0.0
In [40]:
gray_image = Gray.(rgb_image)
Out[40]:
In [41]:
channelview(gray_image)
Out[41]:
512×768 reinterpret(reshape, N0f8, ::Array{Gray{N0f8},2}) with eltype N0f8:
 0.463  0.451  0.451  0.486  0.478  …  0.451  0.439  0.427  0.447  0.451
 0.463  0.471  0.467  0.471  0.471     0.451  0.463  0.459  0.451  0.471
 0.486  0.463  0.467  0.467  0.478     0.463  0.459  0.459  0.459  0.471
 0.471  0.478  0.471  0.467  0.463     0.451  0.447  0.451  0.463  0.451
 0.475  0.486  0.467  0.471  0.471     0.459  0.463  0.463  0.471  0.459
 0.478  0.482  0.471  0.471  0.478  …  0.467  0.467  0.459  0.475  0.475
 0.49   0.471  0.478  0.471  0.471     0.467  0.463  0.463  0.471  0.475
 0.478  0.471  0.463  0.478  0.471     0.459  0.463  0.463  0.451  0.478
 0.475  0.482  0.494  0.478  0.478     0.459  0.463  0.463  0.467  0.475
 0.463  0.482  0.49   0.494  0.478     0.467  0.478  0.475  0.467  0.475
 0.49   0.482  0.486  0.49   0.482  …  0.463  0.478  0.478  0.467  0.463
 0.506  0.478  0.478  0.49   0.49      0.467  0.463  0.463  0.459  0.471
 0.494  0.498  0.486  0.478  0.482     0.463  0.475  0.451  0.463  0.486
 ⋮                                  ⋱                ⋮             
 0.176  0.153  0.165  0.231  0.361  …  0.247  0.192  0.188  0.18   0.18
 0.141  0.137  0.125  0.18   0.243     0.271  0.227  0.208  0.196  0.192
 0.129  0.129  0.165  0.165  0.118     0.282  0.271  0.278  0.282  0.263
 0.133  0.145  0.153  0.22   0.302     0.29   0.263  0.22   0.22   0.208
 0.169  0.235  0.349  0.514  0.545     0.255  0.271  0.231  0.227  0.263
 0.298  0.482  0.58   0.49   0.396  …  0.275  0.255  0.278  0.29   0.29
 0.49   0.537  0.6    0.608  0.62      0.29   0.271  0.286  0.29   0.275
 0.671  0.761  0.855  0.761  0.659     0.231  0.243  0.243  0.278  0.302
 0.725  0.702  0.639  0.557  0.522     0.298  0.31   0.298  0.298  0.302
 0.475  0.451  0.537  0.675  0.659     0.267  0.251  0.224  0.204  0.212
 0.455  0.455  0.51   0.553  0.604  …  0.196  0.188  0.196  0.184  0.2
 0.0    0.0    0.0    0.0    0.0       0.0    0.0    0.0    0.0    0.0

The grayscale image is computed by taking a linear combination of the R(ed), G(reen), and B(lue) channels, $$ 0.299 \mathbf{R} + 0.587 \mathbf{G} + 0.114 \mathbf{B}, $$ according to Rec. ITU-R BT.601-7.

In [42]:
@show 0.299 * channelview(rgb_image)[1, 1, 1] + 
    0.587 * channelview(rgb_image)[2, 1, 1] + 
    0.114 * channelview(rgb_image)[3, 1, 1]
@show channelview(gray_image)[1, 1];
0.299 * (channelview(rgb_image))[1, 1, 1] + 0.587 * (channelview(rgb_image))[2, 1, 1] + 0.114 * (channelview(rgb_image))[3, 1, 1] = 0.4617176470588235
(channelview(gray_image))[1, 1] = 0.463N0f8
In [43]:
# first pixel
rgb_image[1]
Out[43]:
In [44]:
# first pixel converted to gray scale
Gray{N0f8}(0.299 * rgb_image[1].r + 0.587 * rgb_image[1].g + 0.114 * rgb_image[1].b)
Out[44]:
  • The transpose of a matrix $\mathbf{A} \in \mathbb{R}^{m \times n}$ is the $n \times m$ matrix $$ \mathbf{A}' = \begin{pmatrix} a_{11} & \cdots & a_{m1} \\ \vdots & \ddots & \vdots \\ a_{1n} & \cdots & a_{mn} \end{pmatrix}. $$ Alternative notation: $\mathbf{A}^T$, $\mathbf{A}^t$.

    • A symmetric matrix satisfies $\mathbf{A} = \mathbf{A}'$.

    • $(\beta \mathbf{A})' = \beta \mathbf{A}'$.

    • $(\mathbf{A} + \mathbf{B})' = \mathbf{A}' + \mathbf{B}'$.
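These transpose identities are easy to check numerically (a quick sketch with random matrices):

```julia
using Random

Random.seed!(216)
A, B = randn(3, 4), randn(3, 4)
β = 2.5
@show (β * A)' == β * A'        # true
@show (A + B)' == A' + B'       # true
@show A'' == A                  # true: transposing twice recovers A
```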

In [45]:
A
Out[45]:
3×4 Matrix{Float64}:
 0.0   1.0  -2.3  0.1
 1.3   4.0  -0.1  0.0
 4.1  -1.0   0.0  1.7
In [46]:
A'
Out[46]:
4×3 adjoint(::Matrix{Float64}) with eltype Float64:
  0.0   1.3   4.1
  1.0   4.0  -1.0
 -2.3  -0.1   0.0
  0.1   0.0   1.7
In [47]:
# note it's not rotating the picture!
rgb_image'
Out[47]:

Matrix-matrix and matrix-vector products

  • Matrix-matrix multiplication or matrix-matrix product: For $\mathbf{A} \in \mathbb{R}^{m \times n}$ and $\mathbf{B} \in \mathbb{R}^{n \times p}$, $$ \mathbf{C} = \mathbf{A} \mathbf{B} $$ is the $m \times p$ matrix with entries $$ c_{ij} = a_{i1} b_{1j} + a_{i2} b_{2j} + \cdots + a_{in} b_{nj}. $$ Note the number of columns in $\mathbf{A}$ must equal the number of rows in $\mathbf{B}$.

  • Vector inner product $\mathbf{a}' \mathbf{b}$ is a special case of matrix-matrix multiplication with $m=p=1$.
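For instance (a minimal sketch), the inner product coincides with a $1 \times n$ times $n \times 1$ matrix product:

```julia
using LinearAlgebra

a = [1.0, 2.0, 3.0]
b = [4.0, 5.0, 6.0]
@show a'b                # 32.0: a row vector times a column vector gives a scalar
@show a'b == dot(a, b)   # true
```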

  • View of matrix-matrix multiplication as inner products. $c_{ij}$ is the inner product of the $i$-th row of $\mathbf{A}$ and the $j$-th column of $\mathbf{B}$: $$ \begin{pmatrix} & & \\ & c_{ij} & \\ & & \end{pmatrix} = \begin{pmatrix} & & \\ - \, \mathbf{a}_i' \, - \\ & & \end{pmatrix} \begin{pmatrix} & | & \\ & \mathbf{b}_j & \\ & | & \end{pmatrix}. $$
In [48]:
m, n, p = 3, 2, 4
A = randn(m, n)
Out[48]:
3×2 Matrix{Float64}:
 -0.296707  -0.437574
 -1.53662   -1.88922
 -0.823482   0.737361
In [49]:
B = randn(n, p)
Out[49]:
2×4 Matrix{Float64}:
 0.491077   0.0922716  0.265857   0.960385
 0.18826   -1.49729    0.411452  -2.01491
In [50]:
C = A * B
Out[50]:
3×4 Matrix{Float64}:
 -0.228084   0.627799  -0.258923    0.596718
 -1.11026    2.68693   -1.18585     2.33087
 -0.265577  -1.18003    0.0844601  -2.27657
In [51]:
# check C[2,3] = A[2,:]'B[:,3]
C[2, 3] ≈ A[2, :]'B[:, 3]
Out[51]:
true
  • The matrix-vector product and the vector-matrix product are special cases of matrix-matrix multiplication, with $p=1$ or $m=1$, respectively.

    • For $\mathbf{A} \in \mathbb{R}^{m \times n}$ and $\mathbf{b} \in \mathbb{R}^{n}$, $$ \mathbf{A} \mathbf{b} = \begin{pmatrix} - \, \mathbf{a}_1' \, - \\ \vdots \\ - \, \mathbf{a}_m' \, - \end{pmatrix} \mathbf{b} = \begin{pmatrix} \mathbf{a}_1' \mathbf{b} \\ \vdots \\ \mathbf{a}_m' \mathbf{b} \end{pmatrix} \in \mathbb{R}^m. $$ Alternatively, $\mathbf{A} \mathbf{b}$ can be viewed as a linear combination of the columns of $\mathbf{A}$: $$ \mathbf{A} \mathbf{b} = \begin{pmatrix} | & & | \\ \mathbf{a}_1 & \cdots & \mathbf{a}_n \\ | & & | \end{pmatrix} \begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix} = b_1 \mathbf{a}_1 + \cdots + b_n \mathbf{a}_n. $$

    • For $\mathbf{a} \in \mathbb{R}^n$ and $\mathbf{B} \in \mathbb{R}^{n \times p}$, $$ \mathbf{a}' \mathbf{B} = \mathbf{a}' \begin{pmatrix} | & & | \\ \mathbf{b}_1 & \cdots & \mathbf{b}_p \\ | & & | \end{pmatrix} = (\mathbf{a}' \mathbf{b}_1 \, \ldots \, \mathbf{a}' \mathbf{b}_p). $$ Alternatively, $\mathbf{a}' \mathbf{B}$ can be viewed as a linear combination of the rows of $\mathbf{B}$: $$ \mathbf{a}' \mathbf{B} = (a_1 \, \ldots \, a_n) \begin{pmatrix} - \, \mathbf{b}_1' \, - \\ \vdots \\ - \, \mathbf{b}_n' \, - \end{pmatrix} = a_1 \mathbf{b}_1' + \cdots + a_n \mathbf{b}_n'. $$
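Both views of the matrix-vector product can be verified numerically (a quick sketch with random data):

```julia
using Random

Random.seed!(216)
A = randn(3, 2)
b = randn(2)
# row view: each entry of A*b is an inner product with a row of A
@show A * b ≈ [A[i, :]'b for i in 1:3]                 # true
# column view: A*b is a linear combination of the columns of A
@show A * b ≈ b[1] * A[:, 1] + b[2] * A[:, 2]          # true
```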

  • View of matrix multiplication $\mathbf{C} = \mathbf{A} \mathbf{B}$ as matrix-vector products.

    • $j$-th column of $\mathbf{C}$ equals the product of $\mathbf{A}$ and the $j$-th column of $\mathbf{B}$: $$ \begin{pmatrix} & | & \\ & \mathbf{c}_j & \\ & | & \end{pmatrix} = \mathbf{A} \begin{pmatrix} & | & \\ & \mathbf{b}_j & \\ & | & \end{pmatrix}. $$

    • $i$-th row of $\mathbf{C}$ equals the product of the $i$-th row of $\mathbf{A}$ and $\mathbf{B}$: $$ \begin{pmatrix} & & \\ - \, \mathbf{c}_i' \, - \\ & & \end{pmatrix} = \begin{pmatrix} & & \\ - \, \mathbf{a}_i' \, - \\ & & \end{pmatrix} \mathbf{B}. $$
In [52]:
# check C[:, 2] = A * B[:, 2]
C[:, 2] ≈ A * B[:, 2]
Out[52]:
true
In [53]:
# check C[2, :]' = A[2, :]' * B
# note C[2, :] returns a column vector in Julia!
C[2, :]' ≈ A[2, :]'B
Out[53]:
true

Exercise

Here is a directed graph with 4 nodes and 5 edges.

In [54]:
# a simple directed graph on GS p16
g = SimpleDiGraph(4)
add_edge!(g, 1, 2)
add_edge!(g, 1, 3)
add_edge!(g, 2, 3)
add_edge!(g, 2, 4)
add_edge!(g, 4, 3)
gplot(g, nodelabel=["x1", "x2", "x3", "x4"], edgelabel=["b1", "b2", "b3", "b4", "b5"])
Out[54]:

The adjacency matrix $\mathbf{A}$ has entries \begin{eqnarray*} a_{ij} = \begin{cases} 1 & \text{if node $i$ links to node $j$} \\ 0 & \text{otherwise} \end{cases}. \end{eqnarray*}

In [55]:
# adjacency matrix A
A = convert(Matrix{Int64}, adjacency_matrix(g))
Out[55]:
4×4 Matrix{Int64}:
 0  1  1  0
 0  0  1  1
 0  0  0  0
 0  0  1  0

Give a graph interpretation of $\mathbf{A}^2 = \mathbf{A} \mathbf{A}$, $\mathbf{A}^3 = \mathbf{A} \mathbf{A} \mathbf{A}$, ...

In [56]:
A * A
Out[56]:
4×4 Matrix{Int64}:
 0  0  1  1
 0  0  1  0
 0  0  0  0
 0  0  0  0
In [57]:
A * A * A
Out[57]:
4×4 Matrix{Int64}:
 0  0  1  0
 0  0  0  0
 0  0  0  0
 0  0  0  0
  • Properties of matrix multiplications.

    • Associative: $$ (\mathbf{A} \mathbf{B}) \mathbf{C} = \mathbf{A} (\mathbf{B} \mathbf{C}) = \mathbf{A} \mathbf{B} \mathbf{C}. $$
    • Associative with scalar-matrix multiplication: $$ (\alpha \mathbf{B}) \mathbf{C} = \alpha \mathbf{B} \mathbf{C} = \mathbf{B} (\alpha \mathbf{C}). $$
    • Distributive with sum: $$ (\mathbf{A} + \mathbf{B}) \mathbf{C} = \mathbf{A} \mathbf{C} + \mathbf{B} \mathbf{C}, \quad \mathbf{A} (\mathbf{B} + \mathbf{C}) = \mathbf{A} \mathbf{B} + \mathbf{A} \mathbf{C}. $$
    • Transpose of product: $$ (\mathbf{A} \mathbf{B})' = \mathbf{B}' \mathbf{A}'. $$
    • Not commutative: $$ \mathbf{A} \mathbf{B} \ne \mathbf{B} \mathbf{A} $$ in general. There are exceptions, e.g., $\mathbf{A} \mathbf{I} = \mathbf{I} \mathbf{A}$ for square $\mathbf{A}$.
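The transpose-of-product rule, for example, can be checked with random matrices (a minimal sketch):

```julia
using Random

Random.seed!(216)
A = randn(3, 2)
B = randn(2, 4)
@show (A * B)' ≈ B' * A'   # true: transposing reverses the order of the factors
```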
In [58]:
A = randn(3, 2)
B = randn(2, 4)
A * B
Out[58]:
3×4 Matrix{Float64}:
  1.44831    0.0224851  -2.17109   -1.30445
  3.22625   -0.458544   -1.56929   -1.0179
 -0.365403   0.146925   -0.432406  -0.237292
In [59]:
# dimensions are even incompatible!
B * A
DimensionMismatch("A has dimensions (2,4) but B has dimensions (3,2)")

Stacktrace:
 [1] gemm_wrapper!(C::Matrix{Float64}, tA::Char, tB::Char, A::Matrix{Float64}, B::Matrix{Float64}, _add::LinearAlgebra.MulAddMul{true, true, Bool, Bool})
   @ LinearAlgebra /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.6/LinearAlgebra/src/matmul.jl:643
 [2] mul!
   @ /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.6/LinearAlgebra/src/matmul.jl:169 [inlined]
 [3] mul!
   @ /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.6/LinearAlgebra/src/matmul.jl:275 [inlined]
 [4] *(A::Matrix{Float64}, B::Matrix{Float64})
   @ LinearAlgebra /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.6/LinearAlgebra/src/matmul.jl:160
 [5] top-level scope
   @ In[59]:2
 [6] eval
   @ ./boot.jl:360 [inlined]
 [7] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String)
   @ Base ./loading.jl:1094
  • For two vectors $\mathbf{a} \in \mathbb{R}^m$ and $\mathbf{b} \in \mathbb{R}^n$, the matrix $$ \mathbf{a} \mathbf{b}' = \begin{pmatrix} a_1 \\ \vdots \\ a_m \end{pmatrix} \begin{pmatrix} b_1 \, \cdots \, b_n \end{pmatrix} = \begin{pmatrix} a_1 b_1 & \cdots & a_1 b_n \\ \vdots & & \vdots \\ a_m b_1 & \cdots & a_m b_n \end{pmatrix} $$ is called the outer product of $\mathbf{a}$ and $\mathbf{b}$.

  • View of matrix-matrix product $\mathbf{C} = \mathbf{A} \mathbf{B}$ as a sum of outer products: $$ \mathbf{C} = \begin{pmatrix} | & & | \\ \mathbf{a}_1 & \cdots & \mathbf{a}_n \\ | & & | \end{pmatrix} \begin{pmatrix} - \, \mathbf{b}_1' \, - \\ \vdots \\ - \, \mathbf{b}_n' \, - \end{pmatrix} = \mathbf{a}_1 \mathbf{b}_1' + \cdots + \mathbf{a}_n \mathbf{b}_n'. $$
In [60]:
m, n, p = 3, 2, 4
A = randn(m, n)
B = randn(n, p)
C = A * B
@show C ≈ A[:, 1] * B[1, :]' + A[:, 2] * B[2, :]'
C ≈ A[:, 1] * (B[1, :])' + A[:, 2] * (B[2, :])' = true
Out[60]:
true
  • Matrix-matrix product in block form: $$ \begin{pmatrix} \mathbf{A} & \mathbf{B} \\ \mathbf{C} & \mathbf{D} \end{pmatrix} \begin{pmatrix} \mathbf{W} & \mathbf{Y} \\ \mathbf{X} & \mathbf{Z} \end{pmatrix} = \begin{pmatrix} \mathbf{A} \mathbf{W} + \mathbf{B} \mathbf{X} & \mathbf{A} \mathbf{Y} + \mathbf{B} \mathbf{Z} \\ \mathbf{C} \mathbf{W} + \mathbf{D} \mathbf{X} & \mathbf{C} \mathbf{Y} + \mathbf{D} \mathbf{Z} \end{pmatrix}. $$ Submatrices need to have compatible dimensions.
In [61]:
# example
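A minimal sketch verifying the block-product identity with randomly chosen conformable blocks:

```julia
using Random

Random.seed!(216)
# conformable blocks: [A B; C D] and [W Y; X Z] are both 5×5
A, B, C, D = randn(2, 2), randn(2, 3), randn(3, 2), randn(3, 3)
W, X, Y, Z = randn(2, 2), randn(3, 2), randn(2, 3), randn(3, 3)
lhs = [A B; C D] * [W Y; X Z]
rhs = [A*W + B*X  A*Y + B*Z; C*W + D*X  C*Y + D*Z]
@show lhs ≈ rhs   # true
```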

Linear functions and operators

  • The matrix-vector product $\mathbf{y} = \mathbf{A} \mathbf{x}$ can be viewed as a function acting on input $\mathbf{x} \in \mathbb{R}^n$ and producing output $\mathbf{y} \in \mathbb{R}^m$. In this sense, we also say any matrix $\mathbf{A}$ is a linear operator.

  • A function $f: \mathbb{R}^n \mapsto \mathbb{R}^m$ is linear if $$ f(\alpha \mathbf{x} + \beta \mathbf{y}) = \alpha f(\mathbf{x}) + \beta f(\mathbf{y}) $$ for all scalars $\alpha, \beta$ and vectors $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$.

  • The definition of a linear function implies that the superposition property holds for any linear combination: $$ f(\alpha_1 \mathbf{x}_1 + \cdots + \alpha_p \mathbf{x}_p) = \alpha_1 f(\mathbf{x}_1) + \cdots + \alpha_p f(\mathbf{x}_p). $$

  • Any linear function is a matrix-vector product and vice versa.

  • For a fixed matrix $\mathbf{A} \in \mathbb{R}^{m \times n}$, the function $f: \mathbb{R}^n \mapsto \mathbb{R}^m$ defined by $$ f(\mathbf{x}) = \mathbf{A} \mathbf{x} $$ is linear, because $f(\alpha \mathbf{x} + \beta \mathbf{y}) = \mathbf{A} (\alpha \mathbf{x} + \beta \mathbf{y}) = \alpha \mathbf{A} \mathbf{x} + \beta \mathbf{A} \mathbf{y} = \alpha f(\mathbf{x}) + \beta f(\mathbf{y})$.

  • Any linear function is a matrix-vector product because \begin{eqnarray*} f(\mathbf{x}) &=& f(x_1 \mathbf{e}_1 + \cdots + x_n \mathbf{e}_n) \\ &=& x_1 f(\mathbf{e}_1) + \cdots + x_n f(\mathbf{e}_n) \\ &=& \begin{pmatrix} f(\mathbf{e}_1) \, \cdots \, f(\mathbf{e}_n) \end{pmatrix} \mathbf{x}. \end{eqnarray*} Hence $f(\mathbf{x}) = \mathbf{A} \mathbf{x}$ with $\mathbf{A} = \begin{pmatrix} f(\mathbf{e}_1) \, \cdots \, f(\mathbf{e}_n) \end{pmatrix}$.
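As a concrete sketch of this construction (using `reverse`, which is a linear function): stacking the images of the unit vectors column by column yields a matrix $\mathbf{A}$ with $f(\mathbf{x}) = \mathbf{A} \mathbf{x}$:

```julia
# a linear function: reverse the entries of a vector
f(x) = reverse(x)
n = 4
e(j) = [i == j ? 1.0 : 0.0 for i in 1:n]     # unit vector e_j
A = hcat([f(e(j)) for j in 1:n]...)          # columns are f(e_1), …, f(e_n)
x = randn(n)
@show A * x ≈ f(x)   # true: f(x) = A x
```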

Permutation

  • Reverser matrix: $$ \mathbf{A} = \begin{pmatrix} 0 & \cdots & 1 \\ \vdots & .^{.^{.}} & \vdots \\ 1 & \cdots & 0 \end{pmatrix}, \quad \mathbf{A} \mathbf{x} = \begin{pmatrix} x_n \\ x_{n-1} \\ \vdots \\ x_2 \\ x_1 \end{pmatrix}. $$
In [62]:
n = 5
A = [i + j == n + 1 ? 1 : 0 for i in 1:n, j in 1:n]
Out[62]:
5×5 Matrix{Int64}:
 0  0  0  0  1
 0  0  0  1  0
 0  0  1  0  0
 0  1  0  0  0
 1  0  0  0  0
In [63]:
x = Vector(1:n)
Out[63]:
5-element Vector{Int64}:
 1
 2
 3
 4
 5
In [64]:
A * x
Out[64]:
5-element Vector{Int64}:
 5
 4
 3
 2
 1
  • Circular shift matrix $$ \mathbf{A} = \begin{pmatrix} 0 & 0 & \cdots & 0 & 1 \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{pmatrix}, \quad \mathbf{A} \mathbf{x} = \begin{pmatrix} x_n \\ x_1 \\ x_2 \\ \vdots \\ x_{n-1} \end{pmatrix}. $$
In [65]:
n = 5
A = [mod(i - j, n) == 1 ? 1 : 0 for i in 1:n, j in 1:n]
Out[65]:
5×5 Matrix{Int64}:
 0  0  0  0  1
 1  0  0  0  0
 0  1  0  0  0
 0  0  1  0  0
 0  0  0  1  0
In [66]:
x = Vector(1:n)
Out[66]:
5-element Vector{Int64}:
 1
 2
 3
 4
 5
In [67]:
A * x
Out[67]:
5-element Vector{Int64}:
 5
 1
 2
 3
 4
In [68]:
A * (A * x)
Out[68]:
5-element Vector{Int64}:
 4
 5
 1
 2
 3
  • The reverser and circular shift matrices are examples of permutation matrices.

    A permutation matrix is a square 0-1 matrix with exactly one entry equal to 1 in each row and exactly one entry equal to 1 in each column.

    Equivalently, a permutation matrix is the identity matrix with columns reordered.

    Equivalently, a permutation matrix is the identity matrix with rows reordered.

    $\mathbf{A} \mathbf{x}$ is a permutation of elements in $\mathbf{x}$.

In [69]:
σ = randperm(n)
Out[69]:
5-element Vector{Int64}:
 1
 4
 2
 3
 5
In [70]:
# permute the rows of identity matrix
P = I(n)[σ, :]
Out[70]:
5×5 SparseMatrixCSC{Bool, Int64} with 5 stored entries:
 1  ⋅  ⋅  ⋅  ⋅
 ⋅  ⋅  ⋅  1  ⋅
 ⋅  1  ⋅  ⋅  ⋅
 ⋅  ⋅  1  ⋅  ⋅
 ⋅  ⋅  ⋅  ⋅  1
In [71]:
x = Vector(1:n)
Out[71]:
5-element Vector{Int64}:
 1
 2
 3
 4
 5
In [72]:
# operator
P * x
Out[72]:
5-element Vector{Int64}:
 1
 4
 2
 3
 5
In [73]:
x[σ]
Out[73]:
5-element Vector{Int64}:
 1
 4
 2
 3
 5

Rotation

  • A rotation matrix in the plane $\mathbb{R}^2$: $$ \mathbf{A} = \begin{pmatrix} \cos \theta & - \sin \theta \\ \sin \theta & \cos \theta \end{pmatrix}. $$ $\mathbf{A} \mathbf{x}$ is $\mathbf{x}$ rotated counterclockwise over an angle $\theta$.
In [74]:
θ = π/4
A = [cos(θ) -sin(θ); sin(θ) cos(θ)]
Out[74]:
2×2 Matrix{Float64}:
 0.707107  -0.707107
 0.707107   0.707107
In [75]:
# rotate counterclockwise 45 degree
x1 = [2, 1]
x2 = A * x1
Out[75]:
2-element Vector{Float64}:
 0.7071067811865477
 2.1213203435596424
In [76]:
# rotate counterclockwise 90 degree
x3 = A * x2
Out[76]:
2-element Vector{Float64}:
 -0.9999999999999997
  2.0
In [77]:
x_vals = [0 0 0; x1[1] x2[1] x3[1]]
y_vals = [0 0 0; x1[2] x2[2] x3[2]]

plot(x_vals, y_vals, arrow = true, color = :blue,
    legend = :none, xlims = (-3, 3), ylims = (-3, 3),
    annotations = [((x1 .+ 0.2)..., "x1"),
                ((x2 .+ 0.2)..., "x2 = A * x1"),
                ((x3 .+ [-1, 0.2])..., "x3 = A * A * x1")],
    xticks = -3:1:3, yticks = -3:1:3,
    framestyle = :origin,
    aspect_ratio = :equal)
Out[77]:
  • Exercise: Given two vectors $\mathbf{x}, \mathbf{y} \in \mathbb{R}^2$ of the same length, how do we find a rotation matrix $\mathbf{A}$ such that $\mathbf{A} \mathbf{x} = \mathbf{y}$?
In [78]:
x, y = randn(2), randn(2)
x = x / norm(x)
y = y / norm(y)
cosθ = x'y / (norm(x) * norm(y))
Out[78]:
0.046199581446062184
In [79]:
sinθ = sqrt(1 - cosθ^2)
A = [cosθ -sinθ; sinθ cosθ]
Out[79]:
2×2 Matrix{Float64}:
 0.0461996  -0.998932
 0.998932    0.0461996
In [80]:
A * x ≈ y
Out[80]:
false
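The check above returns `false` because `sqrt` always yields the nonnegative root, so the rotation may go the wrong way; the sign of $\sin\theta$ is determined by the orientation from $\mathbf{x}$ to $\mathbf{y}$, which the 2-D cross product supplies. A corrected sketch:

```julia
using LinearAlgebra

x, y = randn(2), randn(2)
x /= norm(x)                      # work with unit vectors
y /= norm(y)

cosθ = x'y                        # inner product of unit vectors
sinθ = x[1] * y[2] - x[2] * y[1]  # signed sine from the 2-D cross product
A = [cosθ -sinθ; sinθ cosθ]

A * x ≈ y                         # true
```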

Projection and reflection

  • Projection on the line through $\mathbf{a}$. $$ \mathbf{y} = \frac{\mathbf{a}'\mathbf{x}}{\|\mathbf{a}\|^2} \mathbf{a} = \mathbf{A} \mathbf{x}, $$ where $$ \mathbf{A} = \frac{1}{\|\mathbf{a}\|^2} \mathbf{a} \mathbf{a}'. $$
  • Reflection with respect to the line through $\mathbf{a}$. $$ \mathbf{z} = \mathbf{x} + 2(\mathbf{y} - \mathbf{x}) = \mathbf{B} \mathbf{x}, $$ with $$ \mathbf{B} = \frac{2}{\|\mathbf{a}\|^2} \mathbf{a} \mathbf{a}' - \mathbf{I}. $$
In [81]:
# projection y and reflection z of x with respect to the line through a
a = [3, 2]
A = a * a' / norm(a)^2
x = [1, 2]
y = A * x
B = 2A - I
z = B * x;
In [82]:
x_vals = [0 0 0; x[1] y[1] z[1]]
y_vals = [0 0 0; x[2] y[2] z[2]]

plt = plot(x_vals, y_vals, arrow = true, color = :blue,
    legend = :none, xlims = (-4, 4), ylims = (-2, 4),
    annotations = [((x .+ 0.2)..., "x"),
                ((y .+ 0.2)..., "y"),
                ((z .+ 0.2)..., "z")],
    xticks = -3:1:3, yticks = -1:1:3,
    framestyle = :origin,
    aspect_ratio = :equal)
plot!(plt, [0, a[1]], [0, a[2]], arrow = true,
    annotations = [((a .+ 0.2)..., "a")])
Out[82]:

Incidence matrix

  • For a directed graph with $m$ nodes and $n$ arcs (directed edges), the node-arc incidence matrix $\mathbf{B} \in \{-1,0,1\}^{m \times n}$ has entries \begin{eqnarray*} b_{ij} = \begin{cases} -1 & \text{if edge $j$ starts at vertex $i$} \\ 1 & \text{if edge $j$ ends at vertex $i$} \\ 0 & \text{otherwise} \end{cases}. \end{eqnarray*}
In [83]:
# a simple directed graph on GS p16
g = SimpleDiGraph(4)
add_edge!(g, 1, 2)
add_edge!(g, 1, 3)
add_edge!(g, 2, 3)
add_edge!(g, 2, 4)
add_edge!(g, 4, 3)
gplot(g, nodelabel=["x1", "x2", "x3", "x4"], edgelabel=["b1", "b2", "b3", "b4", "b5"])
Out[83]:
In [84]:
# incidence matrix B
B = convert(Matrix{Int64}, incidence_matrix(g))
Out[84]:
4×5 Matrix{Int64}:
 -1  -1   0   0   0
  1   0  -1  -1   0
  0   1   1   0   1
  0   0   0   1  -1
  • Kirchhoff's current law: Let $\mathbf{B} \in \mathbb{R}^{m \times n}$ be an incidence matrix and $\mathbf{x}=(x_1 \, \ldots \, x_n)'$ with $x_j$ the current through arc $j$, $$ (\mathbf{B} \mathbf{x})_i = \sum_{\text{arc $j$ enters node $i$}} x_j - \sum_{\text{arc $j$ leaves node $i$}} x_j = \text{total current arriving at node $i$}. $$
In [85]:
@variables x1 x2 x3 x4 x5
B * [x1, x2, x3, x4, x5]
Out[85]:
\begin{equation} \left[ \begin{array}{c} - x1 - x2 \\ x1 - x3 - x4 \\ x2 + x3 + x5 \\ x4 - x5 \\ \end{array} \right] \end{equation}
  • Kirchhoff's voltage law: Let $\mathbf{B} \in \mathbb{R}^{m \times n}$ be an incidence matrix and $\mathbf{y} = (y_1, \ldots, y_m)'$ with $y_i$ the potential at node $i$, $$ (\mathbf{B}' \mathbf{y})_j = y_k - y_l \quad \text{if edge $j$ goes from node $l$ to node $k$}, $$ which is the negative of the voltage drop across arc $j$.
In [86]:
@variables y1 y2 y3 y4
B' * [y1, y2, y3, y4]
Out[86]:
\begin{equation} \left[ \begin{array}{c} y2 - y1 \\ y3 - y1 \\ y3 - y2 \\ y4 - y2 \\ y3 - y4 \\ \end{array} \right] \end{equation}

Convolution and filtering

  • The convolution of $\mathbf{a} \in \mathbb{R}^n$ and $\mathbf{b} \in \mathbb{R}^m$ is a vector $\mathbf{c} \in \mathbb{R}^{n+m-1}$ with entries $$ c_k = \sum_{i+j=k+1} a_i b_j. $$ Notation: $\mathbf{c} = \mathbf{a} * \mathbf{b}$. Example: $n=4$, $m=3$, \begin{eqnarray*} c_1 &=& a_1 b_1 \\ c_2 &=& a_1 b_2 + a_2 b_1 \\ c_3 &=& a_1 b_3 + a_2 b_2 + a_3 b_1 \\ c_4 &=& a_2 b_3 + a_3 b_2 + a_4 b_1 \\ c_5 &=& a_3 b_3 + a_4 b_2 \\ c_6 &=& a_4 b_3 \end{eqnarray*}

  • Interpretation: Let $\mathbf{a}$ and $\mathbf{b}$ be the coefficients in polynomials \begin{eqnarray*} p(x) &=& a_1 + a_2 x + \cdots + a_n x^{n-1} \\ q(x) &=& b_1 + b_2 x + \cdots + b_m x^{m-1}, \end{eqnarray*} then $\mathbf{c} = \mathbf{a} * \mathbf{b}$ gives the coefficients of the product polynomial $$ p(x) q(x) = c_1 + c_2 x + \cdots + c_{n+m-1} x^{m+n-2}. $$
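The polynomial interpretation can be verified numerically with `DSP.conv` (both packages are loaded at the top of the notebook); a small sketch with arbitrarily chosen coefficients:

```julia
using DSP, Polynomials

a = [1.0, 2.0, 3.0, 4.0]                    # coefficients of p
b = [5.0, 6.0, 7.0]                         # coefficients of q
c = conv(a, b)                              # a * b, length n + m - 1 = 6

# same as the coefficients of the product polynomial p(x)q(x)
c ≈ coeffs(Polynomial(a) * Polynomial(b))   # true
```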

In [87]:
n, m = 4, 3
@variables a[1:n] b[1:m]
Out[87]:
2-element Vector{Symbolics.Arr{Num, 1}}:
 a[1:4]
 b[1:3]
In [88]:
p = Polynomial(a)
Out[88]:
a[1] + a[2]∙x + a[3]∙x2 + a[4]∙x3
In [89]:
q = Polynomial(b)
Out[89]:
b[1] + b[2]∙x + b[3]∙x2
In [90]:
p * q
Out[90]:
a[1]*b[1] + a[1]*b[2] + a[2]*b[1]∙x + a[3]*b[1] + a[1]*b[3] + a[2]*b[2]∙x2 + a[4]*b[1] + a[3]*b[2] + a[2]*b[3]∙x3 + a[4]*b[2] + a[3]*b[3]∙x4 + a[4]*b[3]∙x5
In [91]:
coeffs(p * q)
Out[91]:
\begin{equation} \left[ \begin{array}{c} a{_1} b{_1} \\ a{_1} b{_2} + a{_2} b{_1} \\ a{_3} b{_1} + a{_1} b{_3} + a{_2} b{_2} \\ a{_4} b{_1} + a{_3} b{_2} + a{_2} b{_3} \\ a{_4} b{_2} + a{_3} b{_3} \\ a{_4} b{_3} \\ \end{array} \right] \end{equation}
  • Probabilistic interpretation. In probability and statistics, convolution often appears when computing the distribution of the sum of two independent random variables. Let $X$ be a discrete random variable taking value $x \in \{0,1,\ldots,n-1\}$ with probability $p_x$ and $Y$ be another discrete random variable taking value $y \in \{0,1,\ldots,m-1\}$ with probability $q_y$. If $X$ is independent of $Y$, then the distribution of $Z = X+Y$ is $$ \mathbb{P}(Z = k) = \sum_{x+y=k} p_x q_y, \quad k = 0, 1, \ldots, n+m-2, $$ which is entry $k+1$ of the convolution $\mathbf{p} * \mathbf{q}$ (with 1-based indexing).
In [92]:
# TODO: X ∼ Bin(n-1, p), Y ∼ Unif(0, m-1)
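A sketch of the TODO above: compute the pmf of $Z = X + Y$ for $X \sim \text{Bin}(n-1, p)$ and $Y \sim \text{Unif}\{0,\ldots,m-1\}$ by convolving the two pmfs (the value $p = 0.3$ is an arbitrary choice):

```julia
using DSP

n, m, p = 4, 3, 0.3
# pmf of X ~ Bin(n-1, p) on {0, ..., n-1}
pX = [binomial(n - 1, k) * p^k * (1 - p)^(n - 1 - k) for k in 0:(n - 1)]
# pmf of Y ~ Unif{0, ..., m-1}
pY = fill(1 / m, m)
# pmf of Z = X + Y on {0, ..., n+m-2} is the convolution of the pmfs
pZ = conv(pX, pY)
sum(pZ) ≈ 1    # a valid pmf
```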
  • Properties of convolution

    • symmetric: $\mathbf{a} * \mathbf{b} = \mathbf{b} * \mathbf{a}$.
    • associative: $(\mathbf{a} * \mathbf{b}) * \mathbf{c} = \mathbf{a} * (\mathbf{b} * \mathbf{c})$.
    • If $\mathbf{a} * \mathbf{b} = \mathbf{0}$, then $\mathbf{a} = \mathbf{0}$ or $\mathbf{b} = \mathbf{0}$.

      These properties follow either from the polynomial interpretation or probabilistic interpretation.

  • $\mathbf{c} = \mathbf{a} * \mathbf{b}$ is a linear function of $\mathbf{b}$ if we fix $\mathbf{a}$.

  • $\mathbf{c} = \mathbf{a} * \mathbf{b}$ is a linear function of $\mathbf{a}$ if we fix $\mathbf{b}$.

  • For $n=4$ and $m=3$,

In [93]:
n, m = 4, 3
@variables a[1:n] b[1:m]
Out[93]:
2-element Vector{Symbolics.Arr{Num, 1}}:
 a[1:4]
 b[1:3]
In [94]:
# Toeplitz matrix corresponding to the vector a
Ta = diagm(6, 3,
     0 => [a[1], a[1], a[1]],
    -1 => [a[2], a[2], a[2]],
    -2 => [a[3], a[3], a[3]],
    -3 => [a[4], a[4], a[4]]
)
Out[94]:
\begin{equation} \left[ \begin{array}{ccc} a{_1} & 0 & 0 \\ a{_2} & a{_1} & 0 \\ a{_3} & a{_2} & a{_1} \\ a{_4} & a{_3} & a{_2} \\ 0 & a{_4} & a{_3} \\ 0 & 0 & a{_4} \\ \end{array} \right] \end{equation}
In [95]:
c = Ta * b
Out[95]:
\begin{equation} \left[ \begin{array}{ccc} a{_1} & 0 & 0 \\ a{_2} & a{_1} & 0 \\ a{_3} & a{_2} & a{_1} \\ a{_4} & a{_3} & a{_2} \\ 0 & a{_4} & a{_3} \\ 0 & 0 & a{_4} \\ \end{array} \right] b \end{equation}
In [96]:
# c = Ta * b
Symbolics.scalarize(c)
Out[96]:
\begin{equation} \left[ \begin{array}{c} a{_1} b{_1} \\ a{_1} b{_2} + a{_2} b{_1} \\ a{_3} b{_1} + a{_1} b{_3} + a{_2} b{_2} \\ a{_2} b{_3} + a{_3} b{_2} + a{_4} b{_1} \\ a{_3} b{_3} + a{_4} b{_2} \\ a{_4} b{_3} \\ \end{array} \right] \end{equation}
In [97]:
# Toeplitz matrix corresponding to the vector b
Tb = diagm(6, 4,
     0 => [b[1], b[1], b[1], b[1]],
    -1 => [b[2], b[2], b[2], b[2]],
    -2 => [b[3], b[3], b[3], b[3]]
)
Out[97]:
\begin{equation} \left[ \begin{array}{cccc} b{_1} & 0 & 0 & 0 \\ b{_2} & b{_1} & 0 & 0 \\ b{_3} & b{_2} & b{_1} & 0 \\ 0 & b{_3} & b{_2} & b{_1} \\ 0 & 0 & b{_3} & b{_2} \\ 0 & 0 & 0 & b{_3} \\ \end{array} \right] \end{equation}
In [98]:
# c = Tb * a
Symbolics.scalarize(Tb * a)
Out[98]:
\begin{equation} \left[ \begin{array}{c} a{_1} b{_1} \\ a{_1} b{_2} + a{_2} b{_1} \\ a{_3} b{_1} + a{_1} b{_3} + a{_2} b{_2} \\ a{_4} b{_1} + a{_3} b{_2} + a{_2} b{_3} \\ a{_4} b{_2} + a{_3} b{_3} \\ a{_4} b{_3} \\ \end{array} \right] \end{equation}
  • The convolution matrices $\mathbf{T}_a$ and $\mathbf{T}_b$ are examples of Toeplitz matrices.

  • When one vector is much longer than the other, say $m \ll n$, we can view the longer vector $\mathbf{a} \in \mathbb{R}^n$ as a signal and take the inner product of the reversed filter (also called kernel) $(b_m \, \ldots \, b_1)' \in \mathbb{R}^m$ with a sliding window of $\mathbf{a}$.

  • If we instead apply the filter (or kernel) in its original order to the sliding window of the signal, the operation is called correlation rather than convolution. For a symmetric filter (or kernel), correlation coincides with convolution.

  • Filtering is extensively used in image processing.

In [99]:
img = testimage("mandrill")
Out[99]:
In [100]:
# correlation with Gaussian kernel
imgg = imfilter(img, Kernel.gaussian(3))
Out[100]:
In [101]:
# Gaussian kernel with σ = 3
Kernel.gaussian(3)
Out[101]:
13×13 OffsetArray(::Matrix{Float64}, -6:6, -6:6) with eltype Float64 with indices -6:6×-6:6:
 0.000343881  0.000633593  0.00104462  …  0.000633593  0.000343881
 0.000633593  0.00116738   0.00192468     0.00116738   0.000633593
 0.00104462   0.00192468   0.00317327     0.00192468   0.00104462
 0.00154117   0.00283956   0.00468165     0.00283956   0.00154117
 0.00203464   0.00374877   0.00618068     0.00374877   0.00203464
 0.00240364   0.00442865   0.00730161  …  0.00442865   0.00240364
 0.00254095   0.00468165   0.00771874     0.00468165   0.00254095
 0.00240364   0.00442865   0.00730161     0.00442865   0.00240364
 0.00203464   0.00374877   0.00618068     0.00374877   0.00203464
 0.00154117   0.00283956   0.00468165     0.00283956   0.00154117
 0.00104462   0.00192468   0.00317327  …  0.00192468   0.00104462
 0.000633593  0.00116738   0.00192468     0.00116738   0.000633593
 0.000343881  0.000633593  0.00104462     0.000633593  0.000343881
In [102]:
# convolution with Gaussian kernel is same as correlation, since Gaussian kernel is symmetric
imgg = imfilter(img, reflect(Kernel.gaussian(3)))
Out[102]:
In [103]:
# correlation with Laplacian kernel
imgl = imfilter(img, Kernel.Laplacian())
Out[103]:

Vandermonde matrix

  • Polynomial of degree $n-1$ or less with coefficients $x_1, x_2, \ldots, x_n$: $$ p(t) = x_1 + x_2 t + x_3 t^2 + \cdots + x_n t^{n-1}. $$ Values of $p(t)$ at $m$ points $t_1, \ldots, t_m$: $$ \begin{pmatrix} p(t_1) \\ p(t_2) \\ \vdots \\ p(t_m) \end{pmatrix} = \begin{pmatrix} 1 & t_1 & \cdots & t_1^{n-1} \\ 1 & t_2 & \cdots & t_2^{n-1} \\ \vdots & \vdots & & \vdots \\ 1 & t_m & \cdots & t_m^{n-1} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = \mathbf{A} \mathbf{x}, $$ where the matrix $\mathbf{A}$ is called a Vandermonde matrix. $f(\mathbf{x}) = \mathbf{A} \mathbf{x}$ maps coefficients of polynomial to function values.
In [104]:
t = Vector(1:5)
A = Vandermonde(t)
Out[104]:
5×5 Vandermonde{Int64}:
 1  1   1    1    1
 1  2   4    8   16
 1  3   9   27   81
 1  4  16   64  256
 1  5  25  125  625
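As a check, multiplying the Vandermonde matrix by a coefficient vector reproduces the polynomial values at $t_1, \ldots, t_m$ (the coefficients below are an arbitrary choice):

```julia
using Polynomials, SpecialMatrices

t = Vector(1.0:5.0)
A = Vandermonde(t)
x = [1.0, -2.0, 0.5, 0.0, 1.0]   # polynomial coefficients (illustrative)
p = Polynomial(x)

A * x ≈ p.(t)                    # true: A maps coefficients to values
```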

Affine function

  • A function $f: \mathbb{R}^n \mapsto \mathbb{R}^m$ is affine if $$ f(\alpha \mathbf{x} + \beta \mathbf{y}) = \alpha f(\mathbf{x}) + \beta f(\mathbf{y}) $$ for all vectors $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$ and scalars $\alpha, \beta$ with $\alpha + \beta = 1$.

  • Any linear function is affine. Why?

  • Definition of affine function implies that $$ f(\alpha_1 \mathbf{x}_1 + \cdots + \alpha_p \mathbf{x}_p) = \alpha_1 f(\mathbf{x}_1) + \cdots + \alpha_p f(\mathbf{x}_p) $$ for all vectors $\mathbf{x}_1, \ldots, \mathbf{x}_p$ and scalars $\alpha_1, \ldots, \alpha_p$ such that $\alpha_1 + \cdots + \alpha_p = 1$.

  • Any affine function $f: \mathbb{R}^n \mapsto \mathbb{R}^m$ is of the form $\mathbf{A} \mathbf{x} + \mathbf{b}$ for some matrix $\mathbf{A} \in \mathbb{R}^{m \times n}$ and vector $\mathbf{b} \in \mathbb{R}^m$ and vice versa.

  • For fixed $\mathbf{A} \in \mathbb{R}^{m \times n}$ and $\mathbf{b} \in \mathbb{R}^m$, define the function $f: \mathbb{R}^n \mapsto \mathbb{R}^m$ by a matrix-vector product plus a constant: $$ f(\mathbf{x}) = \mathbf{A} \mathbf{x} + \mathbf{b}. $$ Then $f$ is affine: if $\alpha + \beta = 1$, then $$ f(\alpha \mathbf{x} + \beta \mathbf{y}) = \mathbf{A}(\alpha \mathbf{x} + \beta \mathbf{y}) + \mathbf{b} = \alpha \mathbf{A} \mathbf{x} + \beta \mathbf{A} \mathbf{y} + \alpha \mathbf{b} + \beta \mathbf{b} = \alpha(\mathbf{A} \mathbf{x} + \mathbf{b}) + \beta (\mathbf{A} \mathbf{y} + \mathbf{b}) = \alpha f(\mathbf{x}) + \beta f(\mathbf{y}). $$

  • Any affine function can be written as $f(\mathbf{x}) = \mathbf{A} \mathbf{x} + \mathbf{b}$ with $$ \mathbf{A} = \begin{pmatrix} f(\mathbf{e}_1) - f(\mathbf{0}), f(\mathbf{e}_2) - f(\mathbf{0}), \ldots, f(\mathbf{e}_n) - f(\mathbf{0}) \end{pmatrix} $$ and $\mathbf{b} = f(\mathbf{0})$.

  • Affine approximation: The first-order Taylor approximation of a differentiable function $f: \mathbb{R}^n \mapsto \mathbb{R}^m$ at a point $\mathbf{z}$ is $$ \widehat f(\mathbf{x}) = f(\mathbf{z}) + [\operatorname{D} f(\mathbf{z})] (\mathbf{x} - \mathbf{z}), $$ where $\operatorname{D} f(\mathbf{z})$ is the derivative matrix or Jacobian matrix or differential matrix $$ \operatorname{D} f(\mathbf{z}) = \begin{pmatrix} \frac{\partial f_1}{\partial x_1} (\mathbf{z}) & \frac{\partial f_1}{\partial x_2} (\mathbf{z}) & \cdots & \frac{\partial f_1}{\partial x_n} (\mathbf{z}) \\ \frac{\partial f_2}{\partial x_1} (\mathbf{z}) & \frac{\partial f_2}{\partial x_2} (\mathbf{z}) & \cdots & \frac{\partial f_2}{\partial x_n} (\mathbf{z}) \\ \vdots & \vdots & & \vdots \\ \frac{\partial f_m}{\partial x_1} (\mathbf{z}) & \frac{\partial f_m}{\partial x_2} (\mathbf{z}) & \cdots & \frac{\partial f_m}{\partial x_n} (\mathbf{z}) \end{pmatrix} = \begin{pmatrix} \nabla f_1(\mathbf{z})' \\ \nabla f_2(\mathbf{z})' \\ \vdots \\ \nabla f_m(\mathbf{z})' \end{pmatrix}. $$

  • Example: $$ f(\mathbf{x}) = \begin{pmatrix} f_1(\mathbf{x}) \\ f_2(\mathbf{x}) \end{pmatrix} = \begin{pmatrix} e^{2x_1 + x_2} - x_1 \\ x_1^2 - x_2 \end{pmatrix}. $$ TODO in class: Derivative matrix is $$ \operatorname{D} f(\mathbf{x}) = ??? $$ First-order approximation of $f$ around $\mathbf{z} = \mathbf{0}$ is $$ \widehat f(\mathbf{x}) = ?? + ?? $$
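The construction $\mathbf{b} = f(\mathbf{0})$, $\mathbf{A} = (f(\mathbf{e}_1) - f(\mathbf{0}) \, \cdots \, f(\mathbf{e}_n) - f(\mathbf{0}))$ from the bullets above can be checked numerically (the affine map `f` below is an illustrative choice):

```julia
using LinearAlgebra

# an arbitrary affine map R^2 -> R^2 (illustrative choice)
f(x) = [2x[1] - x[2] + 1, x[1] + 3x[2] - 2]

n = 2
b = f(zeros(n))                           # b = f(0)
E = Matrix{Float64}(I, n, n)              # columns are e_1, ..., e_n
A = hcat((f(E[:, j]) - b for j in 1:n)...)

x = randn(n)
f(x) ≈ A * x + b                          # true for every x
```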

In [105]:
# a nonlinear function
f(x) = [exp(2x[1] + x[2]) - x[1], x[1]^2 - x[2]]
Df0 = ForwardDiff.jacobian(f, [0, 0])
Out[105]:
2×2 Matrix{Float64}:
 1.0   1.0
 0.0  -1.0
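To see the quality of the affine approximation, compare $f$ with $\widehat f$ near $\mathbf{z} = \mathbf{0}$ using the Jacobian just computed; the error shrinks quadratically in $\|\mathbf{x} - \mathbf{z}\|$ (the evaluation points below are arbitrary):

```julia
using ForwardDiff, LinearAlgebra

f(x) = [exp(2x[1] + x[2]) - x[1], x[1]^2 - x[2]]
z = zeros(2)
fz, Dfz = f(z), ForwardDiff.jacobian(f, z)
fhat(x) = fz + Dfz * (x - z)     # first-order Taylor approximation at z

norm(f([0.01, -0.02]) - fhat([0.01, -0.02]))   # tiny near z
norm(f([1.0, 1.0]) - fhat([1.0, 1.0]))         # much larger far from z
```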

Computational complexity

  • Matrix-vector product: $\mathbf{y} = \mathbf{A} \mathbf{x}$ where $\mathbf{A} \in \mathbb{R}^{m \times n}$ and $\mathbf{x} \in \mathbb{R}^n$. ??? flops?

  • Special cases:

    • $\mathbf{A}$ is diagonal, ??? flops.
    • $\mathbf{A}$ is lower triangular, ??? flops.
    • $\mathbf{A}$ is sparse, # flops $\ll 2mn$.
  • Matrix-matrix product: $\mathbf{C} = \mathbf{A} \mathbf{B}$, where $\mathbf{A} \in \mathbb{R}^{m \times n}$ and $\mathbf{B} \in \mathbb{R}^{n \times p}$. ??? flops?
  • Check a matrix is symmetric or not. ??? flops?
  • Exercise: Evaluate $\mathbf{y} = \mathbf{A} \mathbf{B} \mathbf{x}$, where $\mathbf{A}, \mathbf{B} \in \mathbb{R}^{n \times n}$, in two ways

    • $\mathbf{y} = (\mathbf{A} \mathbf{B}) \mathbf{x}$.
    • $\mathbf{y} = \mathbf{A} (\mathbf{B} \mathbf{x})$.
      Which method is faster?
  • Exercise: Evaluate $\mathbf{y} = (\mathbf{I} + \mathbf{u} \mathbf{v}') \mathbf{x}$, where $\mathbf{u}, \mathbf{v}, \mathbf{x} \in \mathbb{R}^n$ in two ways.

    • Evaluate $\mathbf{A} = \mathbf{I} + \mathbf{u} \mathbf{v}'$, then $\mathbf{y} = \mathbf{A} \mathbf{x}$.
    • Evaluate $\mathbf{y} = \mathbf{x} + (\mathbf{v}'\mathbf{x}) \mathbf{u}$. Which method is faster?
In [106]:
# benchmark
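A sketch of the benchmark the cell above calls for, timing the two ways to evaluate $\mathbf{y} = (\mathbf{I} + \mathbf{u} \mathbf{v}') \mathbf{x}$ (using `@time` to avoid an extra dependency; the size $n$ is an arbitrary choice):

```julia
using LinearAlgebra

n = 3000
u, v, x = randn(n), randn(n), randn(n)

# method 1: form the n×n matrix explicitly — O(n^2) flops and storage
@time y1 = (I + u * v') * x

# method 2: exploit the rank-1 structure — O(n) flops
@time y2 = x + (v'x) * u

y1 ≈ y2    # same result, very different cost
```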